Under the supervision of Blaise Hanczar, Farida Zehraoui and Franck Augé
2024-06-12
How to reduce the number of parameters?
How to compute interactions specific to each patient?
How to pick relevant information from the input data \(X = [x_i]_{1 \leq i \leq L}\)?
How to apply self-attention to large vectors?
Group related features together and apply self-attention within each group.
Grouping Strategies
\[ \begin{align} X_G &= \mathcal{T}\left(X\right) \\ &= \left[X_{g_1}, \cdots, X_{g_4}\right] \end{align} \]
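The grouping transformation \(\mathcal{T}\) can be sketched in a few lines of numpy; the index sets in `groups` are hypothetical placeholders for the actual feature groupings (e.g. derived from pathway annotations):

```python
import numpy as np

def group_features(x, groups):
    """Split a flat omics profile x into predefined feature groups,
    i.e. the transformation T(X) = [X_g1, ..., X_gk]."""
    return [x[idx] for idx in groups]

x = np.arange(8.0)  # toy profile with L = 8 features
groups = [np.array([0, 1]), np.array([2, 3, 4]),
          np.array([5]), np.array([6, 7])]
X_G = group_features(x, groups)  # [X_g1, X_g2, X_g3, X_g4]
```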
Intra-group interactions
Gene grouping creates unwanted restrictions
Genes in group \(g_1\) cannot interact with genes from other groups
Restore group interactions with the Attention mechanism.
\[ X'_{g_i} = \operatorname{FCN}\left(X_{g_i}\right) \]
\[ X'_G = \left[X'_{g_1}, \cdots, X'_{g_4}\right] \]
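A minimal numpy sketch of the intra-group step, assuming one independent fully connected layer with ReLU per group and a shared output size; the weights here are random placeholders, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def group_fcn(x_g, weights, biases):
    """Apply an independent fully connected layer (with ReLU) to each
    group: X'_gi = ReLU(W_i x_gi + b_i), modelling intra-group
    interactions."""
    return [np.maximum(w @ xg + b, 0.0)
            for xg, w, b in zip(x_g, weights, biases)]

d_out = 4                                        # common group embedding size
x_g = [rng.normal(size=3), rng.normal(size=5)]   # two toy groups
weights = [rng.normal(size=(d_out, xg.size)) for xg in x_g]
biases = [np.zeros(d_out) for _ in x_g]
x_g_prime = group_fcn(x_g, weights, biases)      # [X'_g1, X'_g2]
```

Each group gets its own \(W_i\), so parameters scale with the group sizes rather than with the full \(L \times L\) interaction matrix.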
Inter-groups interactions
A new representation of each group that takes features from the other groups into account
\[ \begin{align} \color{query}q_i &= \color{query}X'_{g_i} \cdot W_q^h \\ \color{key}k_i &= \color{key}X'_{g_i} \cdot W_k^h \\ \color{value}v_i &=\color{value} X'_{g_i} \cdot W_v^h \end{align} \]
\[ \begin{align} Z &= \operatorname{MultiHeadAttention}\left(X'_G\right) \\ &= \operatorname{concat}\left(\left[h_1, \cdots, h_H \right]\right) \\ h_i &= \operatorname{Attention}\left({\color{query}Q},{\color{key}K},{\color{value}V}\right) \end{align} \]
:::
:::
::::
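Putting the equations above together, multi-head self-attention over the group embeddings can be sketched in numpy (dimensions and random projection matrices are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, n_heads):
    """Self-attention across group embeddings X (n_groups, d).
    Each head projects X with its own W_q/W_k/W_v, attends across
    groups, and the head outputs are concatenated."""
    heads = []
    for h in range(n_heads):
        Q, K, V = X @ Wq[h], X @ Wk[h], X @ Wv[h]
        A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (n_groups, n_groups)
        heads.append(A @ V)
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
n_groups, d, d_h, H = 4, 8, 4, 2
X = rng.normal(size=(n_groups, d))               # X'_G stacked as rows
Wq = [rng.normal(size=(d, d_h)) for _ in range(H)]
Wk = [rng.normal(size=(d, d_h)) for _ in range(H)]
Wv = [rng.normal(size=(d, d_h)) for _ in range(H)]
Z = multi_head_attention(X, Wq, Wk, Wv, H)       # (n_groups, H * d_h)
```

Because attention runs over groups rather than individual genes, the quadratic cost is in the number of groups, not in \(L\).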
Across cancers, different interactions are learned
Identified pathways:
Identified interactions:
Omics were analyzed individually, but a phenotype results from their interactions
Combine the different omics in a single model.
The attention mechanism can capture interactions between two vectors
High dimensionality
Attention complexity: \(\mathcal{O}(n^2)\)
\[\begin{align} Z_{\beta \rightarrow \alpha} &= \operatorname{CrossAtt}_{\beta \rightarrow \alpha}\left(X_{\alpha}, X_{\beta} \right) \\ &= \operatorname{Attention}\left(Q_{\alpha},K_{\beta},V_{\beta} \right) \end{align}\]
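Following the formula above, a numpy sketch of cross-attention, where queries come from the target modality \(\alpha\) and keys/values from the source modality \(\beta\) (shapes and random projections are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Xa, Xb, Wq, Wk, Wv):
    """CrossAtt_{beta -> alpha}: queries from modality alpha,
    keys and values from modality beta."""
    Q, K, V = Xa @ Wq, Xb @ Wk, Xb @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (n_alpha, n_beta)
    return A @ V                                 # one row per alpha token

rng = np.random.default_rng(0)
d, d_h = 6, 4
Xa = rng.normal(size=(3, d))   # e.g. mRNA group embeddings
Xb = rng.normal(size=(5, d))   # e.g. miRNA group embeddings
Z = cross_attention(Xa, Xb, rng.normal(size=(d, d_h)),
                    rng.normal(size=(d, d_h)), rng.normal(size=(d, d_h)))
```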
Consider all modality pairs: \(n^2\) pairs to evaluate
Only consider pairs known to interact
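The restriction to known interactions amounts to filtering the \(n^2\) ordered pairs against a set of regulatory links; the `KNOWN_PAIRS` set below is a hypothetical stand-in for the actual prior knowledge:

```python
# Hypothetical known regulatory links (source -> target modality)
KNOWN_PAIRS = {("DNAm", "mRNA"), ("miRNA", "mRNA"), ("mRNA", "protein")}

def interacting_pairs(modalities):
    """Restrict the n^2 ordered modality pairs to those with a known
    regulatory interaction; only these get a cross-attention module."""
    return [(a, b) for a in modalities for b in modalities
            if (a, b) in KNOWN_PAIRS]

pairs = interacting_pairs(["DNAm", "mRNA", "miRNA", "protein"])
```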
Layer-wise relevance propagation
A patient can be diagnosed efficiently, but what are the disease drivers? How can the patient be treated?
What are the important genes or potential biomarkers?
What are the actions that could lead a patient to a healthier state?
Counterfactuals
How would \(x\) change if \(y\) had been \(y^{\prime}\)?
\(y\) was predicted because input \(x\) had values \(\left(x_{1}, x_{2}, x_{3}, \ldots\right)\). If \(x\) instead had values \(x_{1}^{\prime}\) and \(x_{2}^{\prime}\) while the other variables, \(\left(x_{3}, \ldots\right)\), had remained the same, \(y^{\prime}\) would have been predicted.
(Wachter et al., 2017)
\[ \operatorname*{argmin}_{x^{\text{CF}}} \mathcal{L}\left(g\left(x^{\text{CF}}\right), y^{\text{CF}} \right) + d\left(x^{\text{CF}}, x\right) \]
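The Wachter objective can be minimized by gradient descent. A small self-contained sketch, assuming a logistic classifier \(g(x) = \sigma(w \cdot x)\), a squared prediction loss, and an \(L_2\) distance term (all illustrative choices, not the thesis setup):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wachter_counterfactual(x0, w, y_cf, lam=0.1, lr=0.1, steps=2000):
    """Gradient descent on L(g(x), y_cf) + lam * ||x - x0||^2 for a
    logistic classifier g(x) = sigmoid(w . x) (Wachter et al., 2017)."""
    x = x0.copy()
    for _ in range(steps):
        p = sigmoid(w @ x)
        # d/dx [(p - y_cf)^2 + lam * ||x - x0||^2]
        grad = 2 * (p - y_cf) * p * (1 - p) * w + 2 * lam * (x - x0)
        x -= lr * grad
    return x

w = np.array([1.0, -2.0])
x0 = np.array([-1.0, 1.0])          # g(x0) ~ 0.05 -> predicted class 0
x_cf = wachter_counterfactual(x0, w, y_cf=1.0)
```

The distance term keeps the counterfactual close to the original point while the prediction loss pushes it across the decision boundary.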
Is it sufficient to obtain realistic and actionable points?
Data manifold closeness = respect the original data distribution
GANs capture the data distribution (Goodfellow et al., 2014)
\[ \mathcal{L} = {\color{manifold}\mathbb{E}_{x\sim p_{d}}\left[ D\left(x\right)\right] - \mathbb{E}_{x^{\text{CF}}\sim p_{g}}\left[ D\left(x^{\text{CF}}\right)\right] + \lambda \mathbb{E}_{\tilde{x}\sim p_{g}}\left[ {\left( {\left\|\nabla_{\tilde{x}}D\left(\tilde{x}\right) \right\|}_{2} -1 \right)}^{2}\right]} \\ + {\color{cf}\mathcal{L}_{\operatorname{Cl}} + \mathcal{L}_{\operatorname{Cl}_{T}}} + {\color{sparsity}\mathcal{L}_{\text{Reg}}\left(G\right)} \]
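The highlighted manifold term is the WGAN critic loss with a gradient penalty. A numpy sketch of that term alone (the classifier and sparsity terms are omitted; the inputs are placeholder critic outputs, not real model values):

```python
import numpy as np

def manifold_loss(d_real, d_fake, grad_norms, lam=10.0):
    """WGAN-GP critic term of the counterfactual objective:
    d_real = D(x) on real profiles, d_fake = D(x_CF) on generated
    counterfactuals, grad_norms = ||grad D(x~)||_2 on interpolated
    points, lam = gradient-penalty weight."""
    wasserstein = d_real.mean() - d_fake.mean()
    penalty = lam * ((grad_norms - 1.0) ** 2).mean()
    return wasserstein + penalty

d_real = np.array([1.0, 1.0])
d_fake = np.array([0.0, 0.0])
grad_norms = np.ones(4)       # penalty vanishes when norms equal 1
loss = manifold_loss(d_real, d_fake, grad_norms)
```

The penalty term enforces the 1-Lipschitz constraint on the critic, which is what makes the Wasserstein estimate valid and pulls counterfactuals toward the data manifold.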
| Metric | Value |
|---|---|
| \(L_1\) | 2440 |
| \(L_2\) | 30 |
| \(L_{\infty}\) | 1 |
| \(L_0\) | 0.52 |
| \(\mathcal{A}_{\text{kNN}}\) | 0 |
| \(\mathcal{A}_{\text{Oracle}}\) | 0.94 |
GDA: gene-disease association from DisGeNET / COSMIC: Catalogue of Somatic Mutations in Cancer
AttOmics: applied the self-attention mechanism to omics profiles to capture patient-specific interactions. Self-attention was applied to groups of features, allowing prior knowledge to be incorporated through the grouping.
Aurélien Beaude, Milad Rafiee Vahid, Franck Augé, Farida Zehraoui, and Blaise Hanczar. “AttOmics: attention-based architecture for diagnosis and prognosis from omics data.” In: Intelligent Systems for Molecular Biology (ISMB). Lyon, France, 2023.
CrossAttOmics: Integrate multi-omics data based on the known regulatory interactions between modalities and the cross-attention.
Aurélien Beaude, Franck Augé, Farida Zehraoui, and Blaise Hanczar. “CrossAttOmics: Multi-Omics data integration with CrossAttention.” In: Bioinformatics, Under Revision (2024).
CrossAttOmicsGate: Let the network score each interaction with a gating mechanism
Aurélien Beaude, Farida Zehraoui, Franck Augé, and Blaise Hanczar. “Interpretable deep learning for multimodal high dimensional omics data.” Under preparation (2024).
Counterfactual Generation: find the perturbation of the molecular profile that changes the prediction from a disease state to a healthy one.
How to compute attention between scalar values?
\[ A_{ij} = \operatorname{softmin}\left(\left|Q_{i} - K_{j} \right| \right) \]
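Following the formula above, a numpy sketch of scalar attention: each query attends most strongly to the closest key, since softmin puts the highest weight on the smallest distance (the Triton implementation would fuse these operations into one kernel):

```python
import numpy as np

def softmin(z, axis=-1):
    """softmin(z) = softmax(-z): largest weight on the smallest entry."""
    e = np.exp(-(z - z.min(axis=axis, keepdims=True)))
    return e / e.sum(axis=axis, keepdims=True)

def scalar_attention(q, k, v):
    """A_ij = softmin(|q_i - k_j|) over j, then output = A @ v."""
    A = softmin(np.abs(q[:, None] - k[None, :]), axis=-1)
    return A @ v, A

q = np.array([0.0, 5.0])
k = np.array([0.0, 5.0])
v = np.array([1.0, 2.0])
out, A = scalar_attention(q, k, v)
```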
Efficient implementation with Triton.
Knowledge is incomplete or may contain errors. How to handle this?
Knowledge is iteratively constructed, and omics measurements represent an average over all occurring pathways.
What to do about unannotated features?
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| No Gate | 0.980 ± 0.001 | 0.982 ± 0.002 | 0.979 ± 0.002 | 0.980 ± 0.002 |
| Gate | 0.987 ± 0.001 | 0.989 ± 0.001 | 0.987 ± 0.001 | 0.987 ± 0.001 |
PhD defense - Aurélien BEAUDE